AITopics | probability measure

Collaborating Authors

probability measure

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy

Mustafi, Aratrika, Mukherjee, Soumya

arXiv.org Machine LearningMay-25-2026

We consider the problem of sampling from an unnormalized Boltzmann/ Gibbs density, π(θ) exp V(θ),θ Θ Rd, where the normalization constant is unknown (and/or intractable) and only the potential function V (and typically its derivatives) can be evaluated. This problem arises across various domains in Bayesian inference, statistical physics, and modern machine learning. A common variational perspective on sampling is to characterize the target distribution π as the unique minimizer of a functional (typically a divergence functional) over the space of probability measures. From this viewpoint, sampling can be formulated as evolving an initial distribution ρ0 toward π via the gradient flow of this functional under a suitable geometric structure on the space of probability measures. In this paper, we focus on a gradient flow based sampling methodology built from the spherical Hellinger Kantorovich (SHK), also known as the Wasserstein Fisher Rao (WFR), geometry on the space of probability measures (Kondratyev and Vorotnikov, 2019; Liero et al., 2018; Chizat et al., 2015). When the variational objective is the exclusive KL divergence ρ 7 KL(ρ π), the SHK gradient flow generates a time-indexed family of marginals {ρt}t 0 (initialized at ρ0 P2(Θ)) that evolves according to the continuity reaction equation (4). This evolution is equivalent to the birth-death Langevin dynamics introduced in Lu et al. (2019) .

artificial intelligence, exp, machine learning, (20 more...)

arXiv.org Machine Learning

2605.23879

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Add feedback

Sliced-Regularized Optimal Transport

Nguyen, Khai

arXiv.org Machine LearningMay-21-2026

We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothened sliced OT (SOT) plan. To the best of our knowledge, SROT is the first approach to leverage a version of SOT plan as a reference to improve classical OT. We provide a formal definition of SROT, derive its dual formulation, and provide a post-Bayesian interpretation of SROT. We then develop a Sinkhorn-style algorithm for efficient computation, retaining the same scalability advantages as EOT. By incorporating a scalable SOT plan as a prior, SROT yields more accurate approximations of the exact OT plan than EOT under the same level of regularization. Moreover, the resulting transport plan improves upon the reference SOT plan itself. We further introduce the corresponding OT divergence induced by SROT, named SROT divergence, and analyze its topological and computational properties. Finally, we validate our approach through experiments on synthetic datasets and color transfer tasks, demonstrating that SROT is better than both EOT and SOT in approximating exact OT. Additional experiments on gradient flows further highlight the advantages of SROT divergence.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2604.23944

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

Lin, Likun, Wang, Zhongjian, Xin, Jack, Zhang, Zhiwen

arXiv.org Machine LearningMay-21-2026

Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker--Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.

machine learning, natural language, target measure, (20 more...)

arXiv.org Machine Learning

2605.21388

Country: North America > United States > Rhode Island (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Jacobs, Peter Matthew, Phillips, Jeff M.

arXiv.org Machine LearningMay-20-2026

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $ε$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular cartesian grid sketch of the samples. We show that (especially under $α$-Hölder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $ε$ error in $ε^{-\max(2,\frac{d+1+o(1)}{1+α})}$ time for $0 < α< 1$ Hölder smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $Θ(ε^{-2})$ for $α> 1/2$ when $d=2$ and nearly optimal as $α\to 1$ when $d = 3$.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2605.20122

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families

Lee, HakGeun, Chun, Hyonho

arXiv.org Machine LearningMay-13-2026

This paper studies the identifiability and stability of drifting fields within the framework of Generative Modeling via Drifting. The motivating question is whether a zero-drift equilibrium identifies the target distribution, and whether an approximate zero drift implies weak distributional convergence. Since the original drifting model employs the Laplace kernel by default, we first analyze why standard Gaussian score-based arguments fail to apply. This analysis motivates the introduction of companion-elliptic kernel families, which are characterized by a companion potential satisfying an elliptic closure relation. We show that this class naturally contains the Laplace kernel and consists precisely of Gaussian and Matérn kernels with smoothness parameter $ν\ge 1/2$. Within this class, we establish field identifiability for arbitrary Borel probability measures on $\mathbb{R}^d$: if the drifting field vanishes identically, then the two measures must coincide. As for stability, we demonstrate that field convergence alone does not guarantee weak convergence, since mass may escape to infinity while remaining invisible to the field. Although tightness of the sequence directly removes this obstruction and restores weak stability, we prove that, even without tightness, every $C_0$-vague cluster point lies exactly on the defect ray $\{cp:0\le c\le1\}$. Consequently, a single scalar $C_0$-observable suffices to detect the missing mass and recover weak convergence.

artificial intelligence, convergence, machine learning, (17 more...)

arXiv.org Machine Learning

2604.24196

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Differentially Private Sampling from Distributions via Wasserstein Projection

Takakura, Shokichi, Liew, Seng Pei, Hasegawa, Satoshi

arXiv.org Machine LearningMay-12-2026

In this paper, we study the problem of sampling from a distribution under the constraint of differential privacy (DP). Prior works measure the utility of DP sampling with density ratio-based measures such as KL divergence. However, such formulations suffer from two key limitations: 1) they fail to capture the geometric structure of the support, and 2) they are not applicable when the supports of the distributions differ. To deal with these issues, we develop a novel framework for DP sampling with Wasserstein distance as the utility measure. In this formulation, we propose Wasserstein Projection Mechanism (WPM), a minimax optimal mechanism based on Wasserstein projection. Furthermore, we develop efficient algorithms for computing the proposed mechanisms approximately and provide convergence guarantees.

machine learning, mechanism, natural language, (17 more...)

arXiv.org Machine Learning

2605.10015

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)

Add feedback

Ratio-based Loss Functions

Helgerth, Lena, Christmann, Andreas

arXiv.org Machine LearningMay-8-2026

Algorithms in machine learning and AI do critically depend on at least three key components: (i) the risk function, which is the expectation of the loss function, (ii) the function space, which is often called the hypothesis space, and (iii) the set of probability measures, which are allowed for the specified algorithm. This paper gives a survey of a certain class of loss functions, which we call ratio-based. In supervised learning, margin-based loss functions for classification tasks depending on the product of the output values $y_i$ and the predictions $f(x_i)$ as well as distance-based loss functions depending on the difference of $y_i$ and $f(x_i)$ for regression are common. Distance-based loss functions are in particular useful, if an additive model assumption seems plausible, i.e. the common signal plus noise assumption. However, in the literature, several loss functions proposed for regression purposes have a multiplicative error structure in mind and pay attention to relative errors, i.e. to the ratio of $y_i$ and $f(x_i)$. In this survey article, we systematically investigate such ratio-based loss functions and propose a few new losses, which may be interesting for future research. We concentrate on investigating general properties of ratio-based loss functions like continuity, Lipschitz-continuity, convexity, and differentiability, because these properties play a central role in most machine learning algorithms. Therefore, we do not focus on some specific machine learning algorithm to derive universal consistency, learning rates, or stability results. Instead, we want to enable future research in this direction.

artificial intelligence, machine learning, survey article, (19 more...)

arXiv.org Machine Learning

2605.05808

Country: Europe (0.28)

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

Gaussian mixture models in Hilbert spaces via kernel methods

López-Montero, Daniel, Álvarez-López, Antonio, Matabuena, Marcos

arXiv.org Machine LearningMay-8-2026

Modern datasets across many disciplines increasingly consist of time-evolving, potentially infinite-dimensional random objects, such as dynamic functional data, which are naturally modeled in Hilbert spaces. In these settings, characterizing probability measures, for example, through densities, can be ill-defined or technically challenging. Motivated by clustering applications, we propose a Gaussian mixture framework for Hilbert-space-valued data based on kernel mean embeddings and develop efficient optimization algorithms for estimation. We establish theoretical guarantees showing that the proposed algorithm is well defined and that the model yields a dense class of approximations in infinite-dimensional spaces. We evaluate the framework through extensive experiments on diverse structures and data geometries, including $L^2$-functional data and random graphs in Laplacian spaces arising in modern medical applications.

artificial intelligence, hilbert space, machine learning, (19 more...)

arXiv.org Machine Learning

2605.05996

Country: Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.40)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Technology (0.93)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Entropic Riemannian Neural Optimal Transport

Micheli, Alessandro, Sapora, Silvia, Monod, Anthea, Bhatt, Samir

arXiv.org Machine LearningMay-7-2026

Many machine learning problems involve data supported on curved spaces such as spheres, rotation groups, hyperbolic spaces, and general Riemannian manifolds, where Euclidean geometry can distort distances, averages, and the resulting optimal transport (OT) problem. Existing manifold OT methods have pursued amortized out-of-sample maps, while entropic regularization has made discrete OT more scalable, but these advantages have remained largely disjoint. We propose Entropic Riemannian Neural Optimal Transport (Entropic RNOT), a unified framework that combines intrinsic entropic OT with amortized out-of-sample evaluation on Riemannian manifolds. Our method learns a single target-side Schrödinger potential through a neural pullback parameterization, recovers the induced Gibbs coupling, and uses the resulting conditional laws to construct intrinsic transport surrogates. These include barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditional surrogates on stochastically complete manifolds, the latter turning possibly atomic target laws into absolutely continuous ones. For fixed regularization $\varepsilon>0$, we prove that the proposed hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics. As consequences, barycentric surrogates converge in $L^2$, while heat-smoothed surrogates are stable at fixed heat time and asymptotically unbiased as the heat time vanishes. The guarantees hold for compactly supported data on possibly noncompact manifolds. Empirically, our method matches or improves over Euclidean, tangent-space, and log-Euclidean baselines on benchmarks over $\mathbb{S}^2$, $\mathrm{SO}(3)$, $\mathrm{SPD}(3)$, $\mathrm{SE}(3)$, and $\mathbb{H}^2$, scales favorably relative to discrete manifold Sinkhorn, and in a protein-ligand docking application, refines poses on $\mathrm{SE}(3)$ without retraining or per-instance optimization.

artificial intelligence, machine learning, manifold, (17 more...)

arXiv.org Machine Learning

2605.04255

Country: Europe > United Kingdom > England (0.92)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Strategic Distribution Shift of Interacting Agents via Coupled Gradient Flows

Neural Information Processing SystemsApr-29-2026, 00:32:55 GMT

We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.40)

Add feedback